---
title: "FreeCodeCamp 2017 Survey Summary"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
```
Column {data-width=350}
-----------------------------------------------------------------------
### Summary
The data are from a 2017 FreeCodeCamp survey aimed at new coders with less than 5 years of experience. The results shown here are limited to the 10 countries with the most survey respondents. Most respondents from these countries are male and under 30 years old (with a peak at age 25), which suggests interest among the younger generation. The United States has the largest median amount of money spent per month of programming, approximately $30 USD. These amounts include money spent on both bootcamp and non-bootcamp courses. Interest in and completion of bootcamps is an area that could be further explored.
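
As a rough illustration of how a median monthly amount like the one above can be derived, the following base R sketch divides total spending by months of programming and takes the median per country. The `toy` data frame and its values are hypothetical and made up for illustration; only the column names mirror the survey columns.

```r
# Hypothetical toy data; the numbers are invented and are not survey results.
toy <- data.frame(
  CountryLive       = c("A", "A", "A", "B", "B"),
  MoneyForLearning  = c(100, 300, 0, 60, 120),
  MonthsProgramming = c(10, 10, 5, 6, 0)
)

# Keep only rows with positive spending and a positive number of months,
# so the division below is well defined.
spenders <- toy[toy$MoneyForLearning > 0 & toy$MonthsProgramming > 0, ]

# Money spent per month of programming.
spenders$money_per_month <- spenders$MoneyForLearning / spenders$MonthsProgramming

# Median monthly amount for each country.
median_by_country <- aggregate(money_per_month ~ CountryLive,
                               data = spenders, FUN = median)
print(median_by_country)
```

Respondents who report no spending are excluded in this sketch, so the median describes only those who paid something.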
### Gender Distribution
```{r, fig.height = 10, fig.width=10}
library(tidyverse)
survey_df <- read_csv("2017-fCC-New-Coders-Survey-Data.csv",
                      col_types = cols_only('Age' = col_double(),
                                            'AttendedBootcamp' = col_integer(),
                                            'BootcampName' = 'c',
                                            'BootcampFinish' = 'c',
                                            'CountryLive' = 'c',
                                            'Gender' = 'c',
                                            'MoneyForLearning' = col_double(),
                                            'MonthsProgramming' = col_double()
                      ))
# MoneyForLearning is the amount of money (USD) participants spent on learning,
# from the moment they started coding until they completed the survey.
# Money spent per month of programming is calculated from it in a later chunk.
# Limit the analysis to the top 10 countries by number of survey responses:
# United States of America, India, United Kingdom, Canada, Brazil, Germany, Poland, Russia, Australia, France
countries = c("United States of America", "India", "United Kingdom", "Canada", "Brazil", "Germany",
"Poland", "Russia", "Australia", "France")
gender_df <- survey_df %>% filter(CountryLive %in% countries, !(is.na(Gender))) %>% group_by(Gender) %>% count()
total_counts <- sum(gender_df$n)
gender_df <- gender_df %>% mutate(percent = n / total_counts * 100)
# Minimal theme without axes, grid, or border, suitable for a pie chart
blank_theme <- theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.border = element_blank(),
    panel.grid = element_blank(),
    axis.ticks = element_blank()
  )
agender = paste("agender", round(gender_df$percent[[1]], 2), "%", sep=" ")
female = paste("female", round(gender_df$percent[[2]], 2), "%", sep=" ")
gender_queer = paste("gender queer", round(gender_df$percent[[3]], 2), "%", sep=" ")
male = paste("male", round(gender_df$percent[[4]], 2), "%", sep=" ")
trans = paste("trans", round(gender_df$percent[[5]], 2), "%", sep=" ")
# Legend labels, in the same (alphabetical) order as the Gender groups
labels = c(agender, female, gender_queer, male, trans)
ggplot(gender_df, aes(x="", y=percent, fill=Gender))+
geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0) + blank_theme +
scale_fill_brewer(palette="Paired", labels=labels) + theme(axis.text.x=element_blank()) +
guides(fill=guide_legend(title="Gender")) + theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 30))
```
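
The legend labels in the chunk above are built one position at a time, which assumes the gender groups always appear in the same order. A possible alternative, sketched here in base R, builds every label in one vectorized call so each label follows its own row. The `counts` data frame and its numbers are hypothetical, not the survey's actual counts.

```r
# Hypothetical counts for illustration only; not the survey's actual numbers.
counts <- data.frame(Gender = c("female", "male", "trans"),
                     n = c(20, 75, 5))

# Percentage share of each group.
counts$percent <- counts$n / sum(counts$n) * 100

# One label per row, e.g. "male 75 %", without relying on row positions.
legend_labels <- paste(counts$Gender, round(counts$percent, 2), "%", sep = " ")
print(legend_labels)
```

Labels built this way use the group names exactly as they appear in the data, so they may differ in spelling from hand-written ones.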
Column {data-width=550}
-----------------------------------------------------------------------
### Median Amount Spent per Month Programming (USD)
```{r, fig.height = 5, fig.width=15}
library(tidyverse)
df <- survey_df %>% group_by(CountryLive) %>% count() %>% arrange(desc(n))
survey_df$MonthsProgramming[is.na(survey_df$MonthsProgramming)] <- 0
survey_df$MoneyForLearning[is.na(survey_df$MoneyForLearning)] <- 0
learning_money_df <- survey_df %>% filter(CountryLive %in% countries, MoneyForLearning > 0, MonthsProgramming > 0) %>%
mutate(money_per_month = MoneyForLearning / MonthsProgramming) %>%
filter(money_per_month <= 10000)
learning_money <- learning_money_df %>%
group_by(CountryLive) %>% summarize(med_money_spent = median(money_per_month, na.rm=TRUE)) %>%
arrange(desc(med_money_spent))
# Order the bars by median amount so the largest value is easy to spot
ggplot(learning_money, aes(x = reorder(CountryLive, med_money_spent), y = med_money_spent)) +
  geom_col(fill = "steelblue") + ylab("USD") + coord_flip() + xlab("") +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5), text = element_text(size = 22))
```
### Age Distribution
```{r, fig.height = 5, fig.width = 15}
learning_money_df$Age[is.na(learning_money_df$Age)] <- 0
age_df <- learning_money_df[learning_money_df$Age > 0,]
#which.max(ggplot_build(p)$data[[1]]$count)
ggplot(age_df, aes(x=Age)) + geom_histogram(bins=30, color="black", fill="grey") + geom_vline(xintercept = 25, color="red") +
xlab("Age") + ylab("") + theme(plot.title = element_text(hjust = 0.5) ,text=element_text(size=22))
```
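
The red line in the histogram above is drawn at a hard-coded age of 25. One way the peak could be computed from the data instead is sketched below in base R, taking the most frequent age. The `ages` vector is hypothetical, and using the most frequent exact age is an assumption; a histogram bin peak can differ depending on bin width.

```r
# Hypothetical ages for illustration only.
ages <- c(22, 25, 25, 31, 25, 22, 40)

# Count how often each age occurs.
age_counts <- table(ages)

# The age with the highest count (the first one, if there is a tie).
peak_age <- as.numeric(names(age_counts)[which.max(age_counts)])
print(peak_age)
```

The resulting value could then be passed as `xintercept` to `geom_vline()` in place of the literal 25.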